Convergence and Alignment of Gradient Descent with Random Backpropagation Weights

Song, Ganlin, Xu, Ruitu, Lafferty, John (Department of Statistics and Data Science)

Neural Information Processing Systems

Stochastic gradient descent with backpropagation is the workhorse of artificial neural networks. It has long been recognized that backpropagation fails to be a biologically plausible algorithm. Fundamentally, it is a non-local procedure: updating one neuron's synaptic weights requires knowledge of the synaptic weights or receptive fields of downstream neurons. This limits the use of artificial neural networks as a tool for understanding the biological principles of information processing in the brain. Lillicrap et al. (2016) propose a more biologically plausible "feedback alignment" algorithm that uses random and fixed backpropagation weights, and show promising simulations. In this paper we study the mathematical properties of the feedback alignment procedure by analyzing convergence and alignment for two-layer networks under squared error loss. In the overparameterized setting, we prove that the error converges to zero exponentially fast, and also that regularization is necessary in order for the parameters to become aligned with the random backpropagation weights. Simulations are given that are consistent with this analysis and suggest further generalizations. These results contribute to our understanding of how biologically plausible algorithms might carry out weight learning in a manner different from Hebbian learning, with performance that is comparable with the full non-local backpropagation algorithm.
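To make the update rule concrete, here is a minimal sketch of feedback alignment for a two-layer network under squared error. It assumes a linear hidden layer and full-batch steps, which are simplifications for illustration rather than the paper's exact setting. The names `fa_step`, `alignment`, `beta` (forward output weights) and `b` (fixed random backward weights) are hypothetical.

```python
def forward(W, beta, x):
    """Two-layer linear network: hidden h = W x, output f(x) = beta . h."""
    h = [sum(w_kj * x_j for w_kj, x_j in zip(row, x)) for row in W]
    return h, sum(beta_k * h_k for beta_k, h_k in zip(beta, h))


def fa_step(W, beta, b, data, lr):
    """One full-batch step on the loss 0.5 * sum_i (f(x_i) - y_i)^2.

    The output weights beta follow their true gradient. The hidden weights W
    receive the error through the fixed vector b instead of beta, which is
    the only difference from ordinary backpropagation in this sketch.
    """
    p, d = len(W), len(W[0])
    grad_W = [[0.0] * d for _ in range(p)]
    grad_beta = [0.0] * p
    for x, y in data:
        h, out = forward(W, beta, x)
        err = out - y
        for k in range(p):
            grad_beta[k] += err * h[k]
            for j in range(d):
                # Backpropagation would use beta[k] here instead of b[k].
                grad_W[k][j] += err * b[k] * x[j]
    new_W = [[W[k][j] - lr * grad_W[k][j] for j in range(d)] for k in range(p)]
    new_beta = [beta[k] - lr * grad_beta[k] for k in range(p)]
    return new_W, new_beta


def alignment(beta, b):
    """Cosine similarity between forward weights beta and backward weights b."""
    dot = sum(u * v for u, v in zip(beta, b))
    norm = (sum(u * u for u in beta) * sum(v * v for v in b)) ** 0.5
    return dot / norm
```

In this sketch, passing `b` equal to `beta` recovers plain gradient descent, and tracking `alignment(beta, b)` across steps is one way to observe whether the forward weights drift toward the backward weights.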


Review for NeurIPS paper: Biological credit assignment through dynamic inversion of feedforward networks

Neural Information Processing Systems

As the authors note, the stability of the feedback dynamics depends on a condition on the eigenvalues of WB - alpha*I. Without it, the feedback dynamics will yield unpredictable results and presumably not perform effective credit assignment. This condition is extremely unlikely to be satisfied generically, and is essentially the analog of sign-symmetry in forward and backward weights when one considers pseudoinverses rather than transposes. The authors manually enforce that it be satisfied at initialization, and manually adjust the backward weights if the condition is violated during training. These manual initialization choices and adjustments are doing much of the work of credit assignment in the authors' algorithm -- I can't tell from the results as presented how helpful the dynamic inversion really is.
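For reference, the kind of eigenvalue condition discussed here can be written out under an explicit assumption. Suppose the feedback dynamics are linear in a state u with system matrix WB - αI and a constant drive c. This form is assumed for illustration and is not quoted from the paper. The standard stability criterion for linear systems then reads:

```latex
\dot{u}(t) = (WB - \alpha I)\,u(t) + c
\ \text{is asymptotically stable}
\iff
\operatorname{Re}\,\lambda_i(WB - \alpha I) < 0 \ \ \text{for all } i
\iff
\operatorname{Re}\,\lambda_i(WB) < \alpha \ \ \text{for all } i .
```

The last step uses the fact that the eigenvalues of WB - αI are those of WB shifted by -α, so under this assumption the condition constrains the spectrum of the product WB, not W or B separately.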


Learning efficient backprojections across cortical hierarchies in real time

Max, Kevin, Kriener, Laura, García, Garibaldi Pineda, Nowotny, Thomas, Senn, Walter, Petrovici, Mihai A.

arXiv.org Artificial Intelligence

Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which, however, requires biologically implausible weight transport from feed-forward to feedback paths. We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity and using only information locally available to the synapses. Our method is completely phase-free (no forward and backward passes or phased learning) and allows for efficient error propagation across multi-layer cortical hierarchies, while maintaining biologically plausible signal transport and learning. Our method is applicable to a wide class of models and improves on previously known biologically plausible ways of credit assignment: compared to random synaptic feedback, it can solve complex tasks with fewer neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding.


Investigating the Scalability and Biological Plausibility of the Activation Relaxation Algorithm

Millidge, Beren, Tschantz, Alexander, Seth, Anil, Buckley, Christopher L

arXiv.org Artificial Intelligence

The recently proposed Activation Relaxation (AR) algorithm provides a simple and robust approach for approximating the backpropagation of error algorithm using only local learning rules. We have previously shown that the algorithm can be further simplified and made more biologically plausible by (i) introducing a learnable set of backwards weights, which overcomes the weight-transport problem, and (ii) avoiding the computation of nonlinear derivatives at each neuron. However, the efficacy of these simplifications has, so far, only been tested on simple multi-layer-perceptron (MLP) networks. Here, we show that these simplifications still maintain performance using more complex CNN architectures and challenging datasets, which have proven difficult for other biologically-plausible schemes to scale to. We also investigate whether another biologically implausible assumption of the original AR algorithm - the frozen feedforward pass - can be relaxed without damaging performance. The backpropagation of error algorithm (backprop) has been the engine driving the successes of modern machine learning with deep neural networks.


Relaxing the Constraints on Predictive Coding Models

Millidge, Beren, Tschantz, Alexander, Seth, Anil, Buckley, Christopher L

arXiv.org Artificial Intelligence

Predictive coding is an influential theory of cortical function which posits that the principal computation the brain performs, which underlies both perception and learning, is the minimization of prediction errors. While motivated by high-level notions of variational inference, detailed neurophysiological models of cortical microcircuits that can implement its computations have been developed. Moreover, under certain conditions, predictive coding has been shown to approximate the backpropagation of error algorithm, and thus provides a relatively biologically plausible credit-assignment mechanism for training deep networks. However, standard implementations of the algorithm still involve potentially neurally implausible features such as identical forward and backward weights, backward nonlinear derivatives, and 1-1 error unit connectivity. In this paper, we show that these features are not integral to the algorithm and can be removed either directly or through learning additional sets of parameters with Hebbian update rules without noticeable harm to learning performance. Our work thus relaxes current constraints on potential microcircuit designs and hopefully opens up new regions of the design-space for neuromorphic implementations of predictive coding.


Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain

Millidge, Beren, Tschantz, Alexander, Seth, Anil K, Buckley, Christopher L

arXiv.org Artificial Intelligence

Can the powerful backpropagation of error (backprop) learning algorithm be formulated in a manner suitable for implementation in neural circuitry? The primary challenge is to ensure that any candidate formulation uses only local information, rather than relying on global (error) signals, as in orthodox backprop. Recently several algorithms for approximating backprop using only local signals, such as predictive coding and equilibrium-prop, have been proposed. However, these algorithms typically impose other requirements which challenge biological plausibility: for example, requiring complex and precise connectivity schemes (predictive coding), or multiple sequential backwards phases with information being stored across phases (equilibrium-prop). Here, we propose a novel local algorithm, Activation Relaxation (AR), which is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system. Our algorithm converges robustly and exactly to the correct backpropagation gradients, requires only a single type of neuron, utilises only a single backwards phase, and can perform credit assignment on arbitrary computation graphs. We illustrate these properties by training deep neural networks on visual classification tasks, and we describe simplifications to the algorithm which remove further obstacles to neurobiological implementation (for example, the weight-transport problem, and the use of nonlinear derivatives), while preserving performance.
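The idea of obtaining the backpropagation gradient as the equilibrium of a dynamical system can be illustrated on a toy chain of scalar tanh units. The sketch below is an assumption-laden illustration, not the paper's implementation: the leaky dynamics `dx_l/dt = -x_l + x_{l+1} * (local derivative)`, the Euler step size, and all function names are chosen here for demonstration.

```python
import math


def forward_chain(weights, a0):
    """Scalar chain a_{l+1} = tanh(w_l * a_l); returns all activations."""
    acts = [a0]
    for w in weights:
        acts.append(math.tanh(w * acts[-1]))
    return acts


def backprop_grads(weights, acts, top_grad):
    """Reference gradients dL/da_l computed by the usual backward recursion."""
    grads = [0.0] * len(acts)
    grads[-1] = top_grad
    for l in range(len(weights) - 1, -1, -1):
        grads[l] = grads[l + 1] * weights[l] * (1.0 - acts[l + 1] ** 2)
    return grads


def relax(weights, acts, top_grad, dt=0.1, steps=500):
    """Euler-integrate dx_l/dt = -x_l + x_{l+1} * d a_{l+1} / d a_l.

    The top unit is clamped to dL/da_L. Each x_l uses only its neighbour
    x_{l+1} and a local derivative, and the fixed point of the dynamics
    satisfies the same recursion as backprop_grads.
    """
    x = [0.0] * len(acts)
    x[-1] = top_grad
    for _ in range(steps):
        new_x = list(x)
        for l in range(len(weights)):
            local_jac = weights[l] * (1.0 - acts[l + 1] ** 2)
            new_x[l] = x[l] + dt * (-x[l] + x[l + 1] * local_jac)
        x = new_x
    return x
```

Comparing `relax(...)` with `backprop_grads(...)` on the same activations shows the relaxed values settling on the backward-recursion gradients in this toy setting.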


Extension of Direct Feedback Alignment to Convolutional and Recurrent Neural Network for Bio-plausible Deep Learning

Han, Donghyeon, Park, Gwangtae, Ryu, Junha, Yoo, Hoi-jun

arXiv.org Machine Learning

In this paper, we focus on improving the direct feedback alignment (DFA) algorithm and extend its use to convolutional and recurrent neural networks (CNNs and RNNs). Even though the DFA algorithm is biologically plausible and has the potential for high-speed training, it has not been considered a substitute for back-propagation (BP) because of its low accuracy in CNN and RNN training. In this work, we propose a new DFA algorithm for BP-level accurate CNN and RNN training. First, we divide the network into several modules and apply the DFA algorithm within each module. Second, DFA with a sparse backward weight is applied. It takes the form of dilated convolution in the CNN case and of sparse matrix multiplication in the RNN case. Additionally, the error propagation method of the CNN becomes simpler through group convolution. Finally, hybrid DFA increases the accuracy of CNN and RNN training to the BP level while taking advantage of the parallelism and hardware efficiency of the DFA algorithm.
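As a rough illustration of a sparse backward weight in DFA, the sketch below builds a fixed random feedback matrix with many zero entries and uses it to project the output error directly onto one hidden layer. The sparsity scheme (independent masking with a `density` parameter) and the names `sparse_feedback` and `dfa_signal` are assumptions for demonstration; the paper's dilated-convolution and modular variants are not reproduced here.

```python
import random


def sparse_feedback(n_hidden, n_out, density, rng):
    """Fixed random feedback matrix of shape (n_hidden, n_out).

    Each entry is non-zero with probability `density` (hypothetical scheme);
    the matrix is drawn once and never trained.
    """
    return [[rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
             for _ in range(n_out)]
            for _ in range(n_hidden)]


def dfa_signal(feedback, error, act_deriv):
    """Teaching signal for one hidden layer: (B e) * f'(z), elementwise.

    `error` is the output-layer error e, sent straight to this layer
    without passing through any intermediate layers.
    """
    return [sum(b * e for b, e in zip(row, error)) * d
            for row, d in zip(feedback, act_deriv)]


# Example: 4 hidden units, 2 outputs, about half of the feedback entries zero.
rng = random.Random(0)
B = sparse_feedback(4, 2, 0.5, rng)
signal = dfa_signal(B, [0.5, -1.0], [1.0, 1.0, 0.0, 1.0])
```

Zero entries in `B` contribute nothing to the projection, so a sparse representation would skip those multiplications in a hardware-oriented implementation.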


Two Routes to Scalable Credit Assignment without Weight Symmetry

Kunin, Daniel, Nayebi, Aran, Sagastuy-Brena, Javier, Ganguli, Surya, Bloom, Jon, Yamins, Daniel L. K.

arXiv.org Machine Learning

The neural plausibility of backpropagation has long been disputed, primarily for its use of non-local weight transport - the biologically dubious requirement that one neuron instantaneously measure the synaptic weights of another. Until recently, attempts to create local learning rules that avoid weight transport have typically failed in the large-scale learning scenarios where backpropagation shines, e.g. ImageNet categorization with deep convolutional networks. Here, we investigate a recently proposed local learning rule that yields competitive performance with backpropagation and find that it is highly sensitive to metaparameter choices, requiring laborious tuning that does not transfer across network architecture. Our analysis indicates the underlying mathematical reason for this instability, allowing us to identify a more robust local learning rule that better transfers without metaparameter tuning. Nonetheless, we find a performance and stability gap between this local rule and backpropagation that widens with increasing model depth. We then investigate several non-local learning rules that relax the need for instantaneous weight transport into a more biologically-plausible "weight estimation" process, showing that these rules match state-of-the-art performance on deep networks and operate effectively in the presence of noisy updates. Taken together, our results suggest two routes towards the discovery of neural implementations for credit assignment without weight symmetry: further improvement of local rules so that they perform consistently across architectures and the identification of biological implementations for non-local learning mechanisms.